Abstract: We consider the problem of collaborative target
localization  by  several  observers,  called  players,  where  the 
reliability  of  each  player  is  unknown.  As  in  our  previous  work 
[1]  we  formulate  this  problem  as  a  20  questions  game  with  noise 
for  collaborative  players  under  a  minimum  entropy  criterion. 
We  extend  the  setting  of  [1]  to  the  case  where  the  players’ 
error  channels  have  unknown  crossover  probabilities.  First,  we 
use  dynamic  programming  to  characterize  the  structure  of  the 
optimal  policy  for  constructing  the  sequence  of  questions.  This 
generalizes  the  multiplayer  policies  derived  in  [1]  for  the  known 
error  channel  setting.  Second,  we  prove  a  separation  theorem 
showing  that  a  sequential  bisection  scheme  achieves  the  same 
performance  as  the  optimal  joint  queries.  This  generalizes  the 
separation  theorem  recently  derived  for  the  known  error  channel 
case  in  [1].  Third,  we  derive  bounds  for  the  maximum  entropy 
loss per iteration. Finally, we show that even for the one-dimensional
case, the optimal query policy for the unknown error
channel  is  not  equivalent  to  a  probabilistic  bisection  policy.  This 
framework  provides  a  methodology  for  simultaneous  sequential 
estimation  of  target  location  and  learning  the  error  channels 
associated  with  the  players. 

I.  Introduction 

Consider the problem of estimating an unknown target
location by playing a 20 questions game with a group of
sensors. In this game, sensors are repeatedly queried about the
target  location.  The  objective  is  to  optimize  the  sequence  of 
queries  when  the  accuracy  of  responses  of  the  noisy  oracles 
is  unknown,  i.e.,  unknown  error  channels.  This  is  especially 
relevant  to  the  case  of  human-in-the-loop  systems  where  the 
probability  of  correct  response  of  the  human  may  be  difficult 
to  predict  and  quantify. 

Sequential  estimation  of  target  position  was  studied  in  [2], 
for  the  single  player  setting,  in  the  context  of  a  noisy  20 
questions  game,  where  the  objective  was  to  minimize  the 
expected  entropy  after  N  questions.  In  the  collaborative  case 
[1],  a  controller  sequentially  poses  a  set  of  questions  about 
target  location  to  multiple  sensors  and  fuses  the  sensors’  noisy 
responses  to  formulate  the  next  questions. 

This paper focuses on the unknown error channel case. Our approach is based on jointly estimating the target and the error channels associated with the players. Using dynamic programming, we characterize the optimal policy and provide bounds on the maximum expected entropy loss per iteration. We also derive a separation theorem that shows that a sequential bisection scheme achieves the same expected entropy loss as the jointly optimal scheme.

This research was partially supported by MURI grant W911NF-11-1-0391.

A.  Previous  Work 

The paper by Jedynak et al. [2] formulates the single player 20 questions problem as a controller querying a noisy oracle about whether or not a target $X^*$ lies in a set $A_n \subset \mathbb{R}^d$. Starting with a prior distribution on the target's location $p_0(\cdot)$, the objective is to minimize the expected entropy of the posterior distribution:

$$\inf_{\pi} \mathbb{E}^{\pi}[H(p_N)] \qquad (1)$$

where $\pi = (\pi_0, \pi_1, \dots)$ denotes the controller's query policy and the entropy is the standard differential entropy [3] $H(p) = -\int_{\mathcal{X}} p(x) \log p(x)\,dx$. The posterior mean or median of $p_N$ is used to estimate the target location after $N$ questions. The densities $f_0$ and $f_1$ correspond to the noisy channel:

$$\mathbb{P}(Y_{n+1} = y \mid Z_n = z) = f_0(y) I(z = 0) + f_1(y) I(z = 1)$$

where $Z_n = I(X^* \in A_n) \in \{0,1\}$ is the channel input. The noisy channel models the conditional probability of the response to each question being correct. Thm. 2 in [2] characterizes the optimal policy under the minimum entropy criterion: the query region is chosen such that $P_n(A_n) := \int_{A_n} p_n(x)\,dx = u^* \in \arg\max_{u \in [0,1]} \phi(u)$, where $\phi(u) = H(u f_1 + (1-u) f_0) - u H(f_1) - (1-u) H(f_0)$ is nonnegative. For the special case of a binary symmetric channel (BSC), $u^* = 1/2$ and the probabilistic bisection policy [2], [4] becomes an optimal policy.
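As a numerical illustration of the bisection result for a BSC, the following minimal sketch evaluates $\phi(u)$ on a grid. It assumes binary responses, so $f_0$ and $f_1$ are treated as probability mass functions and $H$ as discrete entropy in bits; the crossover probability `eps = 0.2` and the helper names are hypothetical choices, not taken from [2].

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy (bits) of a discrete distribution, with 0*log(0) := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log2(p[nz])))

def phi(u, eps):
    """phi(u) = H(u f1 + (1-u) f0) - u H(f1) - (1-u) H(f0) for a BSC(eps)."""
    f1 = np.array([eps, 1.0 - eps])   # pmf of the response y in {0,1} when X* is in A_n
    f0 = np.array([1.0 - eps, eps])   # pmf of the response when X* is not in A_n
    mix = u * f1 + (1.0 - u) * f0
    return entropy_bits(mix) - u * entropy_bits(f1) - (1.0 - u) * entropy_bits(f0)

eps = 0.2                              # hypothetical crossover probability
us = np.linspace(0.0, 1.0, 1001)       # candidate values of P_n(A_n)
vals = np.array([phi(u, eps) for u in us])
u_star = us[np.argmax(vals)]           # maximizer of phi on the grid
capacity = 1.0 - entropy_bits([eps, 1.0 - eps])   # BSC capacity 1 - h_B(eps)
print(u_star, vals.max(), capacity)
```

Under these assumptions the maximizer sits at $u = 1/2$ and the maximum of $\phi$ coincides with the BSC capacity, which is the per-query entropy loss of the bisection policy.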

Recently, Tsiligkaridis et al. [1] derived optimality conditions
for query strategies in the collaborative multiplayer
case.  It  was  shown  that  even  when  the  collaborative  players 
act  independently,  jointly  optimal  policies  require  overlapping 
non-identical  queries.  A  sequential  bisection  policy  for  which 
each  player  responds  to  a  single  question  was  introduced  and  it 
was  proven  that  the  expected  entropy  reduction  for  the  jointly 
optimal  scheme  is  the  same  as  that  of  the  sequential  bisection 
scheme.  Thus,  while  the  jointly  optimal  scheme  might  be 
hard  to  implement  as  the  number  of  players  and  dimensions 
increase, the sequential bisection scheme simplifies the controller
design with no performance degradation.

The function $I(A)$ is the indicator function throughout the paper, i.e.,
$I(A) = 1$ if $A$ is true and zero otherwise.
II. Notation & Assumptions

In this paper, we adopt the setup of [1]. Assume that there
is a target with unknown state $X^* \in \mathcal{X} \subset \mathbb{R}^d$. There are $M$
collaborating  players  that  can  be  asked  questions  at  each  time 
instant.  The  objective  of  the  players  is  to  come  up  with  the 
correct  answer  to  a  kind  of  20  questions  game. 

In the joint estimation setup, we assume that the controller designs queries for $M$ sensors and, after querying, the responses are fused and the next set of questions is formulated (see Fig. 1). Let the $m$th player's query at time $n$ be "does $X^*$ lie in the region $A_n^{(m)} \subset \mathbb{R}^d$?". We denote this query as $Z_n^{(m)} = I(X^* \in A_n^{(m)}) \in \{0,1\}$, to which the player provides a noisy response $Y_{n+1}^{(m)} \in \{0,1\}$. The query region(s) chosen at time $n$ depend on the information available at time $n$. More formally, $\{Z_n^{(m)} = I(X^* \in A_n^{(m)})\}$ is a predictable stochastic process with respect to the filtration generated by $\{A_k^{(m)}\}$ and $\{Y_{k+1}^{(m)}\}$, i.e., $\{A_n^{(m)}\}_m \in \mathcal{F}_n := \sigma(\{(A_k^{(m)}, Y_{k+1}^{(m)})\}_m : 0 \le k \le n-1)$ for $n \in \mathbb{N}$.

At each iteration a current best target estimate $\hat{X}_n$ of $X^*$ is produced (which is an $\mathcal{F}_n$-measurable random variable).

The sequential strategy consists of sequentially asking players queries and using the intermediate responses to refine the posterior (see Fig. 2). For each time epoch, indexed by $n$ and called a cycle, the controller formulates and asks the $M$ players questions $A_{n_t} = A_{n,t}$, $t = 0, \dots, M-1$. Let the $m$th player's query at time $n_t = (n, t) = n_{m-1}$ be denoted by $Z_{n_t} = I(X^* \in A_{n_t}) \in \{0,1\}$ and its associated noisy response by $Y_{n_{t+1}} \in \{0,1\}$. The query region $A_{n_t}$ chosen at time $n_t$ depends on the information available at that time. More formally, define the multi-index $(n, t)$ where $n = 0, 1, \dots$ indexes over cycles and $t = 0, \dots, M-1$ indexes within cycles. Define the nested sequence of sigma-algebras $\mathcal{G}_{n,t}$, $\mathcal{G}_{n,t} \subset \mathcal{G}_{n+i,t+j}$ for all $i \ge 0$ and $j \in \{0, \dots, M-1-t\}$, generated by the sequence of queries and the players' responses. The filtration $\mathcal{G}_{n,t}$ carries all the information accumulated by the controller from time $(0,0)$ to time $(n,t)$. The queries $\{A_{n,t}\}$ formulated by the controller are measurable with respect to this filtration.

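Define the random vector $\epsilon = (\epsilon_1, \dots, \epsilon_M) \in [0, 1/2)^M$ and the joint posterior distributions $\mathbb{P}(X^* = x, \epsilon^* = \epsilon \mid \mathcal{F}_n) = p_n(x, \epsilon)$ and $\mathbb{P}(X^* = x, \epsilon^* = \epsilon \mid \mathcal{G}_{n,t}) = p_{n_t}(x, \epsilon)$.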

For sets $A \subset \mathbb{R}^d$, define $A^1 = A$ and $A^0 = A^c$. Define the $M$-tuples $\mathbf{Y}_{n+1} = (Y_{n+1}^{(1)}, \dots, Y_{n+1}^{(M)})$ and $\mathbf{A}_n = \{A_n^{(1)}, \dots, A_n^{(M)}\}$. Given the responses $\mathbf{Y}_{n+1}$, the posterior update becomes [1]:

$$p_{n+1}(x, \epsilon) \propto \mathbb{P}(\mathbf{Y}_{n+1} = \mathbf{y}_{n+1} \mid \mathbf{A}_n, X^* = x, \epsilon^* = \epsilon, \mathcal{F}_n)\, p_n(x, \epsilon) \qquad (2)$$

Assuming that all sensors are queried in sequence starting from $m = 1$ and ending at $m = M$, the posterior updates (after querying the $(t+1)$th player) become:

$$p_{n_{t+1}}(x, \epsilon) \propto \mathbb{P}(Y_{n_{t+1}} = y_{n_{t+1}} \mid A_{n_t}, X^* = x, \epsilon^*_{t+1} = \epsilon_{t+1}, \mathcal{G}_{n_t})\, p_{n_t}(x, \epsilon)$$

where

$$\mathbb{P}(Y_{n_{t+1}} \mid A_{n_t}, X^*, \epsilon^*_{t+1}, \mathcal{G}_{n_t}) = \begin{cases} f_1^{(t+1)}(Y_{n_{t+1}} \mid \epsilon^*_{t+1}), & X^* \in A_{n_t} \\ f_0^{(t+1)}(Y_{n_{t+1}} \mid \epsilon^*_{t+1}), & X^* \notin A_{n_t} \end{cases}$$

Fig. 1. Joint scheme for $M$ collaborative players responding to binary valued queries about the location $X^*$ of an unknown target.

We make the following assumptions throughout the paper.

Assumption 1. (Conditional Independence) Assume that the players' responses are conditionally independent:

$$\mathbb{P}(\mathbf{Y}_{n+1} \mid \mathbf{A}_n, X^*, \epsilon^*, \mathcal{F}_n) = \prod_{m=1}^{M} \mathbb{P}(Y_{n+1}^{(m)} \mid A_n^{(m)}, X^*, \epsilon^*) \qquad (3)$$

where

$$\mathbb{P}(Y_{n+1}^{(m)} \mid A_n^{(m)}, X^*, \epsilon^*) = \begin{cases} f_1^{(m)}(Y_{n+1}^{(m)} \mid \epsilon_m^*, A_n^{(m)}), & X^* \in A_n^{(m)} \\ f_0^{(m)}(Y_{n+1}^{(m)} \mid \epsilon_m^*, A_n^{(m)}), & X^* \notin A_n^{(m)} \end{cases} \qquad (4)$$

Assumption 2. (Memoryless Binary Symmetric Channels) Players' response channels are independent (memoryless) binary symmetric channels (BSC) [3] with crossover probabilities $\epsilon_m \in [0, 1/2)$:

$$f_j^{(m)}(y^{(m)} \mid \epsilon_m, A_n^{(m)}) = f_j^{(m)}(y^{(m)} \mid \epsilon_m) = \begin{cases} 1 - \epsilon_m, & y^{(m)} = j \\ \epsilon_m, & y^{(m)} \neq j \end{cases}$$

where $m = 1, \dots, M$, $j = 0, 1$.

III. Noisy 20 Questions with Collaborative Players: Unknown Error Channels

We consider the setting where the error probabilities of the $M$ players are unknown. In this case, the Bayes posterior update (2) is not well-defined, so the probabilistic bisection algorithm cannot be directly used. In the generic setup of unknown $\epsilon_m \in [0, 1/2)$ with no a priori information, a joint scheme is to estimate the target $X^*$ and the error probabilities $\epsilon^* = (\epsilon_1^*, \dots, \epsilon_M^*)$. The joint posterior distribution of $(X^*, \epsilon^*)$ is considered here because the error probabilities $\epsilon_m$ are coupled with the target $x$ through the Bayesian update (e.g., see (4) and (2)).

A. Joint Query Design

We consider the minimum entropy criterion (1). Since the error probabilities of sensors are unknown, the joint policy derived in Thm. 1 in [1] is no longer applicable or valid. Define the density parameterized by $\epsilon = (\epsilon_1, \dots, \epsilon_M) \in [0, 1/2)^M$ and $i = (i_1, \dots, i_M) \in \{0,1\}^M$ as $g(\mathbf{y} \mid i, \epsilon) = \prod_{m=1}^{M} f_{i_m}^{(m)}(y^{(m)} \mid \epsilon_m)$. Next, we derive the joint optimality conditions for the case of unknown error probabilities.
Fig. 2. Sequential scheme for $M$ collaborative players responding to
binary valued queries about the location $X^*$ of an unknown target.
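The joint Bayes update (2) under the BSC model of Assumption 2 can be sketched numerically for a single player ($M = 1$) and a one-dimensional target. The grid discretization, the grid sizes, the interval query $A = [0, 0.5]$, and the function names below are illustrative assumptions, not part of the paper.

```python
import numpy as np

def bsc_likelihood(y, z, eps):
    """P(Y = y | Z = z, eps) for a BSC: 1 - eps if y == z, else eps."""
    return np.where(y == z, 1.0 - eps, eps)

def joint_update(post, x_grid, eps_grid, query_upper, y):
    """One Bayes step of (2) for M = 1 and query region A = [0, query_upper].

    post has shape (len(x_grid), len(eps_grid)) and sums to 1.
    """
    z = (x_grid <= query_upper).astype(int)[:, None]   # Z = I(x in A) for each grid x
    like = bsc_likelihood(y, z, eps_grid[None, :])      # shape (nx, ne)
    new_post = like * post                              # prior times likelihood
    return new_post / new_post.sum()                    # normalize

# Hypothetical grids: cell midpoints of [0, 1] and of [0, 1/2).
x_grid = (np.arange(100) + 0.5) / 100.0
eps_grid = (np.arange(50) + 0.5) / 100.0
prior = np.full((100, 50), 1.0 / (100 * 50))            # product of uniform priors

post1 = joint_update(prior, x_grid, eps_grid, query_upper=0.5, y=1)
mass_in_A = post1[x_grid <= 0.5].sum()                  # posterior P(X* in A)
print(mass_in_A)
```

The update couples $x$ and $\epsilon$: after a "yes" response, small error probabilities gain weight where $x \in A$ and large ones gain weight where $x \notin A$, so the posterior is no longer a product distribution even though the prior was.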


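Theorem 1. (Jointly Optimal Policy, Unknown Error Probabilities) Let Assumptions 1 and 2 hold. Consider the problem (1), where the joint policy is made up of the query regions for the $M$ sensors.

1) Optimal policies $\mathbf{A}_n = (A_n^{(1)}, \dots, A_n^{(M)})$ at time $n$ satisfy:

$$\sup_{A^{(1)}, \dots, A^{(M)}} \left\{ H\!\left( \sum_{i \in \{0,1\}^M} \int_{\epsilon} g(\cdot \mid i, \epsilon)\, P_n\!\Big( \bigcap_{m=1}^{M} (A^{(m)})^{i_m}, \epsilon \Big)\, d\epsilon \right) - \sum_{i \in \{0,1\}^M} \int_{\epsilon} H(g(\cdot \mid i, \epsilon))\, P_n\!\Big( \bigcap_{m=1}^{M} (A^{(m)})^{i_m}, \epsilon \Big)\, d\epsilon \right\} =: G_n^* \qquad (5)$$

where $P_n(B, \epsilon)$ denotes $\int_B p_n(x, \epsilon)\,dx$ and the integrals over $\epsilon$ range over $[0, 1/2)^M$.

2) The maximum information gain at time $n$ is:

$$G_n^* = \sum_{m=1}^{M} \mathbb{E}[C(\epsilon_m) \mid \mathcal{F}_n] \qquad (6)$$

where $\mathbb{E}[C(\epsilon_m) \mid \mathcal{F}_n] = \int_{\epsilon_m = 0}^{1/2} C(\epsilon_m)\, p_n(\epsilon_m)\, d\epsilon_m$.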

Proof: 1) Optimality conditions. The solution of (1) yields the Bellman recursion:

$$V_n(p_n) = \inf_{A} \mathbb{E}[V_{n+1}(p_{n+1}) \mid \mathbf{A}_n = A, \mathcal{F}_n]$$

Using a similar argument as in Thm. 2 in [2], the optimal solution at time $n$ is given by maximizing the entropy loss:

$$\begin{aligned} G_n^* &= \sup_{A} I((X^*, \epsilon^*); \mathbf{Y}_{n+1} \mid \mathbf{A}_n = A, \mathcal{F}_n) \\ &= \sup_{A} \left\{ H(p_n) - \mathbb{E}[H(p_{n+1}) \mid \mathbf{A}_n = A, \mathcal{F}_n] \right\} \\ &= \sup_{A} \left\{ H(\mathbf{Y}_{n+1} \mid \mathbf{A}_n = A, \mathcal{F}_n) - H(\mathbf{Y}_{n+1} \mid X^*, \epsilon^*, \mathbf{A}_n = A, \mathcal{F}_n) \right\} \end{aligned} \qquad (7)$$

and the value function is given by $V_n(p_n) = H(p_n) - \sum_{k=n}^{N-1} G_k^*$ for $n < N$ and $V_N(p_N) = H(p_N)$. Rewriting (7) inside the supremum, we obtain (5) [5].

2) Bounds on maximum entropy loss. Note that the second term in (5) is independent of the queries, so the supremum can be restricted to only the first term without loss of generality. This follows from the additivity of the entropy of a product distribution, i.e., $H(g(\cdot \mid i, \epsilon)) = \sum_{m=1}^{M} H(f_{i_m}^{(m)}(\cdot \mid \epsilon_m)) = \sum_{m=1}^{M} h_B(\epsilon_m)$. Using part 1), the capacity formula of the BSC ($C(\epsilon_m) = 1 - h_B(\epsilon_m)$), and the fact that the uniform distribution maximizes the entropy (see Ch. 2 in [3]), the maximum entropy loss can be bounded as $G_n^* \le \mathbb{E}[\sum_m C(\epsilon_m) \mid \mathcal{F}_n]$ [5]. Using the concavity of $H(\cdot)$ along with Thm. 1 from [1], it can be shown that $G_n^* \ge \mathbb{E}[\sum_m C(\epsilon_m) \mid \mathcal{F}_n]$ [5]. ■
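To make the gain formula (6) concrete, the sketch below computes $\sum_m \mathbb{E}[C(\epsilon_m) \mid \mathcal{F}_n]$ from marginal posteriors of the error probabilities stored on a grid. The grid, the two example posteriors (a point mass and a uniform distribution), and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def binary_entropy(eps):
    """h_B(eps) in bits (elementwise), with h_B(0) = h_B(1) = 0."""
    eps = np.atleast_1d(np.asarray(eps, dtype=float))
    out = np.zeros_like(eps)
    inside = (eps > 0) & (eps < 1)
    e = eps[inside]
    out[inside] = -e * np.log2(e) - (1 - e) * np.log2(1 - e)
    return out

def expected_capacity(eps_grid, weights):
    """E[C(eps) | F_n] = sum_k (1 - h_B(eps_k)) w_k for grid posterior weights w."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return float(np.sum((1.0 - binary_entropy(eps_grid)) * weights))

def max_information_gain(eps_grid, marginals):
    """Sum over players of the expected capacities, as in (6)."""
    return sum(expected_capacity(eps_grid, w) for w in marginals)

eps_grid = (np.arange(50) + 0.5) / 100.0      # cell midpoints of [0, 1/2)
point = np.zeros(50)
point[10] = 1.0                               # player 1: error probability known (0.105)
uniform = np.ones(50)                         # player 2: nothing learned yet
gain = max_information_gain(eps_grid, [point, uniform])
print(gain)
```

For the point-mass posterior the expected capacity reduces to the known-channel capacity $C(\epsilon_m)$. For a spread-out posterior, Jensen's inequality and the convexity of $C(\cdot)$ imply that the expected capacity is at least the capacity evaluated at the posterior mean, so the gain changes as the posterior on $\epsilon$ evolves.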

1)  Lower  Bound  on  MSE  Performance:  The  maximum 
entropy  loss  derived  in  Thm.  1  is  used  next  to  provide  a  lower 
bound  on  the  MSE  of  the  joint  sequential  estimator. 


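Theorem 2. (Lower bound on Joint MSE) Assume $H(p_0)$ is finite. Then, the joint MSE of the joint query policy in Thm. 1 satisfies:

$$(\cdots)\, \exp(\cdots) \le \mathbb{E}[\|\hat{X}_n - X^*\|_2^2] + \mathbb{E}[\|\hat{\epsilon}_n - \epsilon^*\|_2^2] \qquad (8)$$

where $K = \exp(2H(p_0))$ is a constant and $\hat{X}_n = \mathbb{E}[X^* \mid \mathcal{F}_n]$, $\hat{\epsilon}_n = \mathbb{E}[\epsilon^* \mid \mathcal{F}_n]$. The average entropy loss after $n$ questions is $\bar{C}_n = \frac{1}{n} \sum_{k=0}^{n-1} G_k^*$.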

Proof:  The  proof  is  similar  to  the  proof  of  Thm.  3  in  [1] 
and  is  included  in  [5].  ■ 

2) Discussion: The jointly optimal policy derived in Thm. 1 bears some similarity to the jointly optimal policy of Thm. 1 in [1], which does not apply to the case of unknown channels. We remark that in the unknown channel setting, the maximum entropy loss $G_n^*$ given in (5) is not time-invariant, unlike in the case of known error channels, in which the maximum entropy loss was the sum of the capacities of the players' channels, $G^* = \sum_{m=1}^{M} C(\epsilon_m)$. This observation motivates adaptive sensor selection: given the constraint that only one sensor may be queried at each time instant, the maximal information gain may, unlike in the known channel case, be obtained by querying different sensors at different time instants based on the collected information.


B.  Sensor  Selection  Scheme 

The control $u_n = u$ denotes that the $u$th sensor is queried
at time $n$, and $A_n^{(u)} = A$ is the associated query region.


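Theorem 3. (Sensor Selection Policy, Unknown Error Probabilities) Consider the problem (1), where the policy is made up of which sensor to choose and the associated query region.

1) At time $n$, optimal query policies satisfy:

$$\max_{1 \le u \le M} G_n^*(u), \qquad G_n^*(u) = \sup_{A} \left\{ H\!\left( \int_{\epsilon_u = 0}^{1/2} \sum_{i=0}^{1} f_i^{(u)}(\cdot \mid \epsilon_u)\, P_n^{(u)}(A^i, \epsilon_u)\, d\epsilon_u \right) - \int_{\epsilon_u = 0}^{1/2} \sum_{i=0}^{1} H(f_i^{(u)}(\cdot \mid \epsilon_u))\, P_n^{(u)}(A^i, \epsilon_u)\, d\epsilon_u \right\} \qquad (9)$$

2) At time $n$, the maximum entropy loss is:

$$G_n^* = \max_{u} G_n^*(u) = \max_{u} \mathbb{E}[C(\epsilon_u) \mid \mathcal{F}_n]$$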

u  u 

Proof:  The  proof  follows  using  techniques  similar  to 
Thm.  1  and  is  included  in  [5].  ■ 
The optimal policy for the minimum expected entropy criterion (1) shown in Thm. 3 prescribes using the sensor $u$ with the maximum information gain (measured through the $u$th sub-marginal distribution $p_n^{(u)}(x, \epsilon_u)$). While the form (9) bears some similarity to the optimality conditions of the known error models (see Thm. 1 in [1]), the bisection policy is no longer optimal.
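Part 2) of Thm. 3 suggests a simple selection rule: query the sensor whose expected capacity under the current posterior is largest. The following sketch illustrates how the selected sensor can change over time as the posteriors sharpen. The Gaussian-shaped posteriors, their centers and widths, and the function names are hypothetical stand-ins for marginals that would come from the Bayes update.

```python
import numpy as np

def capacity(eps):
    """BSC capacity C(eps) = 1 - h_B(eps) in bits, for eps strictly inside (0, 1)."""
    eps = np.asarray(eps, dtype=float)
    return 1.0 + eps * np.log2(eps) + (1.0 - eps) * np.log2(1.0 - eps)

def select_sensor(eps_grid, marginals):
    """Return the index of the sensor maximizing E[C(eps_u) | F_n], and all scores."""
    scores = [float(np.sum(capacity(eps_grid) * w) / np.sum(w)) for w in marginals]
    return int(np.argmax(scores)), scores

eps_grid = (np.arange(50) + 0.5) / 100.0      # cell midpoints of [0, 1/2)

def bump(center, width=0.02):
    """Unnormalized posterior concentrated around `center` (illustrative only)."""
    return np.exp(-0.5 * ((eps_grid - center) / width) ** 2)

# Early on: sensor 0 is believed to have eps near 0.15, sensor 1 is still unknown.
early = [bump(0.15), np.ones_like(eps_grid)]
# Later: the data reveal that sensor 1 is in fact reliable (eps near 0.05).
late = [bump(0.15), bump(0.05)]

u_early, scores_early = select_sensor(eps_grid, early)
u_late, scores_late = select_sensor(eps_grid, late)
print(u_early, u_late)
```

In this toy setting the rule picks sensor 0 at first and switches to sensor 1 once its posterior concentrates on a small error probability, which is the kind of time-varying choice that does not arise when the channels are known.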

1) One-dimensional Case: The next corollary specifies
the form of the optimal policy derived in Thm. 3 for one-dimensional
targets. For simplicity, consider the unit interval
$\mathcal{X} = [0, 1]$ as the target domain.

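Corollary 1. (Sensor Selection Policy, Unknown Error Probabilities, One-dimensional Target) Consider the problem (1) for the optimal sensor and query selection policy. Consider the query regions $A_n = [0, x_n]$. The optimal sensor $u$ and associated query region $A = [0, x]$ at time $n$ are given by:

$$\max_{u} \left\{ \max_{x \in [0,1]} h_B\!\left( g_{1,n}^{(u)}(x) \right) - c_n^{(u)} \right\} \qquad (10)$$

where $h_B(\cdot)$ is the binary entropy function [3] and

$$c_n^{(u)} = \int_{\epsilon_u = 0}^{1/2} h_B(\epsilon_u)\, p_n^{(u)}(\epsilon_u)\, d\epsilon_u$$

$$g_{1,n}^{(u)}(x) = \int_0^x g_{0,n}^{(u)}(t)\, dt + \int_x^1 \left( p_n(t) - g_{0,n}^{(u)}(t) \right) dt$$

$$g_{0,n}^{(u)}(t) = \int_{\epsilon_u = 0}^{1/2} \epsilon_u\, p_n^{(u)}(t, \epsilon_u)\, d\epsilon_u$$

$$p_n(t) = \int_{\epsilon_1 = 0}^{1/2} \cdots \int_{\epsilon_M = 0}^{1/2} p_n(t, \epsilon_1, \dots, \epsilon_M)\, d\epsilon_1 \cdots d\epsilon_M$$

Proof: The proof follows from Thm. 3 [5]. ■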

We note that the optimal policy derived for the case of unknown probability in (10) is not equivalent to the probabilistic bisection policy, i.e., obtaining $P_n^{(u)}([0, x_n^{(u)}]) = 1/2$ for each sensor $u$, then evaluating the information gain and choosing the sensor with the maximum information gain. This heuristic scheme would yield a suboptimal information gain as compared to the maximal information gain given by (10). Thus, in the unknown probability setting, the optimal control law is no longer equivalent to that of the known probability setting (after marginalizing out the noise parameters $\epsilon_1, \dots, \epsilon_M$). This result shows that the two settings are quite different. We empirically observed that there is a unique query point $x = x^* = x_n^{(u)}$ that maximizes the function (10). This is similar to the one-dimensional case for the known channel setting when the query region is of the form $A = [0, x]$; i.e., the optimal point is the (unique) median.
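The gap between the maximizer of (10) and the posterior median can be checked numerically for a single sensor. The sketch below evaluates the objective of (10) at every cell edge of a grid and compares its maximizer with the median of the marginal posterior of the target. The grid sizes, the single earlier response used to create a posterior in which $x$ and $\epsilon$ are dependent, and the names `mu`, `g`, and `query_objective` are assumptions made for this illustration.

```python
import numpy as np

def h_b(p):
    """Binary entropy in bits (elementwise); inputs are clipped away from 0 and 1."""
    p = np.clip(np.asarray(p, dtype=float), 1e-15, 1 - 1e-15)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def query_objective(post, eps_mid):
    """Objective of (10) at every cell edge x, for a single sensor.

    post[i, k] is the posterior mass of (x-cell i, eps-cell k) and sums to 1.
    Returns (gains, cdf), both evaluated at the nx + 1 cell edges.
    """
    p_x = post.sum(axis=1)                        # marginal mass of x per cell
    mu = (post * eps_mid[None, :]).sum(axis=1)    # integral of eps * p(t, eps) over eps
    p_eps = post.sum(axis=0)                      # marginal mass of eps per cell
    c = float(np.sum(h_b(eps_mid) * p_eps))       # expected binary entropy term
    left = np.concatenate(([0.0], np.cumsum(mu)))                       # int_0^x mu
    right = np.concatenate((np.cumsum((p_x - mu)[::-1])[::-1], [0.0]))  # int_x^1 (p - mu)
    g = left + right
    cdf = np.concatenate(([0.0], np.cumsum(p_x)))
    return h_b(g) - c, cdf

nx, ne = 300, 50
x_mid = (np.arange(nx) + 0.5) / nx
x_edge = np.arange(nx + 1) / nx
eps_mid = (np.arange(ne) + 0.5) / (2 * ne)        # cell midpoints of [0, 1/2)
post = np.full((nx, ne), 1.0 / (nx * ne))         # uniform product prior

# One hypothetical earlier response y = 1 to the query "is X* in [0, 0.5]?".
in_A = (x_mid <= 0.5)[:, None]
post = post * np.where(in_A, 1.0 - eps_mid[None, :], eps_mid[None, :])
post /= post.sum()

gains, cdf = query_objective(post, eps_mid)
i_opt = int(np.argmax(gains))                     # maximizer of the objective
i_med = int(np.argmin(np.abs(cdf - 0.5)))         # posterior median of x
x_opt, x_med = x_edge[i_opt], x_edge[i_med]
gain_opt, gain_med = gains[i_opt], gains[i_med]
print(x_opt, x_med, gain_opt, gain_med)
```

For this toy posterior the maximizer falls near $x = 0.30$ while the marginal median is near $x = 1/3$, and the gain at the median is slightly smaller. The maximizer balances the predicted response, which weights each location $t$ by a factor depending on the conditional mean of $\epsilon$ given $t$, rather than balancing the marginal posterior mass of the target.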

C.  Sequential  Query  Design 

In  this  section,  we  show  a  version  of  the  separation  theorem 
(Thm.  2  in  [1])  for  the  unknown  error  channel  case. 

Theorem 4. (Separation, Unknown Error Probabilities) Consider the sequential and joint schemes. Then, it follows that $G_{seq,n}^* = \mathbb{E}[\sum_m C(\epsilon_m) \mid \mathcal{G}_n]$ and $G_n^* = \mathbb{E}[\sum_m C(\epsilon_m) \mid \mathcal{F}_n]$ for all $n$.

Proof: After querying all $M$ players in sequence, using the tower property of expectation and Thm. 1 with $M = 1$ for each sub-instant $n_t$, the maximal entropy loss can be shown to be $G_{seq,n}^* = \mathbb{E}[H(p_n) - H(p_{n+1}) \mid \mathcal{G}_n] = \mathbb{E}[\sum_{m=1}^{M} C(\epsilon_m) \mid \mathcal{G}_n]$ [5]. The second part follows from Thm. 1 part 2). ■
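The tower-property step in this proof can be spelled out as follows. This is a sketch under the assumptions, implicit in the notation but not stated in the text, that $p_{n_0} = p_n$, $p_{n_M} = p_{n+1}$, and $\mathcal{G}_n = \mathcal{G}_{n,0}$; the full argument is in [5].

```latex
\begin{align*}
G^*_{seq,n}
 &= \mathbb{E}\big[ H(p_n) - H(p_{n+1}) \mid \mathcal{G}_n \big] \\
 &= \sum_{t=0}^{M-1} \mathbb{E}\Big[\, \mathbb{E}\big[ H(p_{n_t}) - H(p_{n_{t+1}}) \mid \mathcal{G}_{n,t} \big] \,\Big|\, \mathcal{G}_n \Big]
    && \text{(telescoping sum, tower property)} \\
 &= \sum_{t=0}^{M-1} \mathbb{E}\Big[\, \mathbb{E}\big[ C(\epsilon_{t+1}) \mid \mathcal{G}_{n,t} \big] \,\Big|\, \mathcal{G}_n \Big]
    && \text{(Thm. 1 with } M = 1 \text{ at each } n_t\text{)} \\
 &= \mathbb{E}\Big[ \sum_{m=1}^{M} C(\epsilon_m) \,\Big|\, \mathcal{G}_n \Big]
    && \text{(tower property, } m = t + 1\text{)}
\end{align*}
```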

IV.  Simulation 

Fig. 3 shows a simulation result of the MSE performance for $M = 1$ sensor with unknown error probability. This simulation suggests that the binary responses obtained from one player carry enough information to accurately estimate both the target and the player's error probability.


Fig. 3. Monte Carlo simulation for MSE performance of the joint sequential estimator (of the target $X^*$ and the error probability $\epsilon^*$). The MSE for $X^*$ is shown on the left and the MSE for $\epsilon^*$ on the right, as a function of iteration. 100 Monte Carlo trials were used. The true error probability was set to $\epsilon^* = 0.3$ and the true target location was $X^* = 0.75$. The initial distribution was a product of uniform distributions $p_0(x) = I(x \in [0, 1])$ and $p_0(\epsilon) = 2\, I(\epsilon \in [0, 1/2))$, i.e., $p_0(x, \epsilon) = p_0(x)\, p_0(\epsilon)$.
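An experiment of this kind can be sketched as follows for $M = 1$. The true values $X^* = 0.75$ and $\epsilon^* = 0.3$ and the uniform product prior follow the caption. The grid discretization, the number of queries and trials, the random seed, and the query rule (the interval $[0, x]$ whose predicted response is closest to a fair coin, which maximizes the entropy term in (10)) are assumptions of this sketch and may differ from the setup used to produce Fig. 3.

```python
import numpy as np

def run_trial(rng, x_true=0.75, eps_true=0.3, n_queries=200, nx=200, ne=50):
    """Simulate one run of joint target/error estimation; return squared errors."""
    x_mid = (np.arange(nx) + 0.5) / nx
    eps_mid = (np.arange(ne) + 0.5) / (2 * ne)       # cell midpoints of [0, 1/2)
    post = np.full((nx, ne), 1.0 / (nx * ne))        # product of uniform priors
    sq_err_x, sq_err_eps = [], []
    for _ in range(n_queries):
        p_x = post.sum(axis=1)                       # marginal mass of x per cell
        mu = post @ eps_mid                          # integral of eps * p(t, eps) over eps
        # Predicted P(Y = 1) for the query [0, (j + 1) / nx], j = 0, ..., nx - 1.
        p_yes = np.cumsum(p_x - mu) + (mu.sum() - np.cumsum(mu))
        j = int(np.argmin(np.abs(p_yes - 0.5)))      # most informative edge
        in_A = np.arange(nx) <= j                    # cells inside the query interval
        z = int(x_true <= (j + 1) / nx)              # noiseless answer
        y = z if rng.random() >= eps_true else 1 - z # BSC flips with prob. eps_true
        like = np.where(in_A[:, None] == bool(y),
                        1.0 - eps_mid[None, :], eps_mid[None, :])
        post = post * like                           # Bayes update on the grid
        post /= post.sum()
        x_hat = float(post.sum(axis=1) @ x_mid)      # posterior mean of the target
        eps_hat = float(post.sum(axis=0) @ eps_mid)  # posterior mean of the error prob.
        sq_err_x.append((x_hat - x_true) ** 2)
        sq_err_eps.append((eps_hat - eps_true) ** 2)
    return np.array(sq_err_x), np.array(sq_err_eps)

rng = np.random.default_rng(0)                       # fixed seed for repeatability
trials = [run_trial(rng) for _ in range(20)]         # 20 trials keep the sketch fast
mse_x = np.mean([t[0] for t in trials], axis=0)      # MSE of target estimate per iteration
mse_eps = np.mean([t[1] for t in trials], axis=0)    # MSE of error-prob. estimate
print(mse_x[-1], mse_eps[-1])
```

Plotting `mse_x` and `mse_eps` against the iteration index gives curves analogous to the two panels described in the caption, subject to the assumptions listed above.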


V.  Conclusion 

We  studied  the  problem  of  collaborative  20  questions  with 
noise  for  the  multiplayer  case  under  unknown  error  channels. 
In  this  setting,  we  characterized  jointly  optimal  policies  and 
derived  a  separation  theorem  that  shows  the  jointly  optimal 
design  is  equivalent  to  a  sequential  bisection  design  that  can 
be more easily implemented. Simulations were provided to numerically
evaluate the performance of the proposed sequential
estimator.  Future  work  may  include  cost  constraints  associated 
with  the  use  of  sensors. 